Efficient Algorithms for Lempel-Ziv Encoding
Authors
Abstract
We consider several basic problems for texts and show that if the input texts are given by their Lempel-Ziv codes, then the problems can be solved deterministically in polynomial time even when the original (uncompressed) texts are of exponential size. The growing importance of massively stored information requires new approaches to algorithms that work on compressed texts without decompressing them. Denote by LZ(w) the version of a string w produced by the Lempel-Ziv encoding algorithm. For given compressed strings LZ(T) and LZ(P), we give the first known deterministic polynomial-time algorithms to compute compressed representations of the set of all occurrences of the pattern P in T, all periods of T, all palindromes of T, and all squares of T. Then we consider several classical language recognition problems: (1) regular language recognition: given LZ(T) and a language L described by a regular expression, test if T ∈ L; (2) extended regular language recognition: given LZ(T) and a language L described by an LZ-compressed regular expression, test if T ∈ L, where the alphabet is unary; (3) context-free language recognition: given LZ(T) and a language L described by a context-free grammar, test if T ∈ L, where the alphabet is unary. We show that the first recognition problem has a polynomial-time algorithm and that the other two problems are NP-hard. We also show that the LZ encoding can be computed on-line with polynomial time delay and small space (i.e., proportional to the size of the compressed text). In addition, a compressed representation of a pattern-matching automaton for the compressed pattern is computed in polynomial time.
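To make concrete how an uncompressed text can be exponentially longer than its code, here is a minimal sketch of a greedy Lempel-Ziv-style factorization. It is an illustration under stated assumptions, not the encoding variant defined in the paper: the function names `lz_factorize` and `lz_decode`, the `(start, length)` factor format, and the rule that a copied factor must lie entirely inside the already-processed prefix are all hypothetical choices made for this example.

```python
def lz_factorize(text):
    """Greedy factorization (illustrative variant, not the paper's definition).

    Each factor is either a single literal character or a pair
    (start, length) that copies the longest prefix of the remaining
    text occurring entirely inside the already-processed prefix.
    """
    factors = []
    pos = 0
    while pos < len(text):
        best_start, best_len = -1, 0
        length = 1
        # Grow the candidate while it still occurs within text[:pos].
        while pos + length <= len(text):
            start = text.find(text[pos:pos + length], 0, pos)
            if start == -1:
                break
            best_start, best_len = start, length
            length += 1
        if best_len == 0:
            factors.append(text[pos])  # character not seen before
            pos += 1
        else:
            factors.append((best_start, best_len))
            pos += best_len
    return factors


def lz_decode(factors):
    """Rebuild the text by replaying literals and copies in order."""
    out = []
    for factor in factors:
        if isinstance(factor, str):
            out.append(factor)
        else:
            start, length = factor
            out.extend(out[start:start + length])
    return "".join(out)


if __name__ == "__main__":
    unary = "a" * 1024
    code = lz_factorize(unary)
    print(len(unary), len(code))  # text length vs. number of factors
```

On a unary text each copied factor doubles the processed prefix, so the number of factors grows only logarithmically with the text length. This brute-force factorizer is quadratic or worse in the uncompressed length; it is meant only to show the size gap that motivates working on LZ(w) directly.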
Similar resources
LZW Chromosome Encoding in Estimation of Distribution Algorithms
Estimation of distribution algorithm (EDA) can solve more complicated problems than its predecessor (Genetic Algorithm). EDA uses various methods to probabilistically model a group of highly fit individuals. Calculating the model in sophisticated EDA is very time consuming. To reduce the model building time, the authors propose compressed chromosome encoding. A chromosome is encoded using a for...
A General Practical Approach to Pattern Matching over Ziv-Lempel Compressed Text
We address in this paper the problem of string matching on Lempel-Ziv compressed text. The goal is to search for a pattern in a text without uncompressing it. This is a highly relevant issue, since it is essential to have compressed text databases where efficient searching is still possible. We develop a general technique for string matching when the text comes as a sequence of blocks. This abstracts th...
Faster Compact On-Line Lempel-Ziv Factorization
We present a new on-line algorithm for computing the Lempel-Ziv factorization of a string that runs in O(N log N) time and uses only O(N log σ) bits of working space, where N is the length of the string and σ is the size of the alphabet. This is a notable improvement compared to the performance of previous on-line algorithms using the same order of working space but running in either O(N log^3 N)...
Text Compression Algorithms - a Comparative Study
Data compression may be defined as the science and art of representing information in a condensed form. For decades, data compression has been one of the critical enabling technologies for the ongoing digital multimedia revolution. Many data compression algorithms are available to compress files of different formats. This paper provides a survey of different...
Randomized Efficient Algorithms for Compressed Strings: the Finger-print Approach
Denote by LZ(w) the coded form of a string w produced by the Lempel-Ziv encoding algorithm. We consider several classical algorithmic problems for texts in the compressed setting. The first of them is equality testing: given LZ(w) and integers i, j, k, test the equality w[i..i+k] = w[j..j+k]. We give a simple and efficient randomized algorithm for this problem using the finger-printing id...
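The equality test w[i..i+k] = w[j..j+k] can be illustrated with Karp-Rabin style fingerprints. The sketch below works on the uncompressed string w, so it is not the compressed-setting algorithm the abstract refers to; it only shows the fingerprinting idea itself. The names `build_fingerprints` and `probably_equal`, the modulus 2^61 - 1, and the 0-based inclusive indexing are assumptions made for this example.

```python
import random

MOD = (1 << 61) - 1  # a Mersenne prime, chosen here as the fingerprint modulus


def build_fingerprints(w, base=None):
    """Precompute prefix fingerprints and powers of the base for string w.

    If no base is given, one is drawn at random, which is what makes the
    equality test randomized.
    """
    if base is None:
        base = random.randrange(256, MOD - 1)
    prefix = [0] * (len(w) + 1)
    power = [1] * (len(w) + 1)
    for idx, ch in enumerate(w):
        prefix[idx + 1] = (prefix[idx] * base + ord(ch)) % MOD
        power[idx + 1] = (power[idx] * base) % MOD
    return prefix, power


def probably_equal(prefix, power, i, j, k):
    """Test w[i..i+k] == w[j..j+k] (0-based, inclusive) via fingerprints.

    Equal substrings always compare equal; unequal substrings compare
    equal only if their fingerprints collide.
    """
    def fingerprint(lo, hi):  # fingerprint of the slice w[lo:hi]
        return (prefix[hi] - prefix[lo] * power[hi - lo]) % MOD

    return fingerprint(i, i + k + 1) == fingerprint(j, j + k + 1)


if __name__ == "__main__":
    prefix, power = build_fingerprints("abracadabra")
    print(probably_equal(prefix, power, 0, 7, 3))  # compares "abra" with "abra"
```

After linear-time preprocessing, each query costs a constant number of arithmetic operations. The answer is one-sided: a reported mismatch is always correct, while a reported match is correct except in the event of a fingerprint collision.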